SUMMARY



The purpose of the project described in this report is to study 
the effect of variations in unit size (i.e., amount of material between 
response frames) in programmed instruction on the learner's performance 
on several criterion tasks. Several discrepancies appear to exist 
between linearly programmed materials and the laboratory operant 
conditioning paradigms after which programmed instruction is patterned. 
The focal point of these discrepancies is the reinforcement aspects of 
programmed instruction, including the lack of research on 
shaping meaningful verbal material and the questionable reinforcing 
qualities of feedback in programmed instruction.



To investigate these concerns, the frequency of responding was 
varied in programmed materials and the effects were studied. The 
research was exploratory and the objectives were to answer the 
following questions:

(i) If the unit size in linearly programmed instructional material 
is varied while all other material-centered variables are 
held constant, will there be a differential efficiency of 
learning regardless of the individuals involved? 

(ii) Does the optimal unit size, if any, vary with the content 
of the program? 

(iii) Does the optimal unit size, if any, vary with the specific 
individual learner? 

(iv) Can learner-centered variables be identified which will enable 
accurate predictions of success to be made with respect to 
frame size and content?

Four commercial programs were edited and modified to be similar 
in format and structure. These covered the subject areas of astronomy, 
computer programming, psychology, and statistics. The programs were all 
prepared with 896 frames. Each was prepared in four versions which 
differed only in the frequency (density) of response frames (response 
frames were eliminated by filling in the blanks with the correct response).

The versions are: 

(1) Every frame is a response frame. 

(2) Every fourth frame is a response frame. 

(3) Every 16th frame is a response frame. 

(4) Every 32nd frame is a response frame.
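
As a rough check on what these four versions imply, the following sketch counts the response frames in each version of an 896-frame program. It assumes that the response frame is the last frame of each block (frames 4, 8, 12, ... in version 2), which the summary does not state explicitly. The function and variable names are illustrative only.

```python
TOTAL_FRAMES = 896  # frames per program, as stated in the report

# Spacing between response frames in each version (1 = every frame).
VERSION_SPACING = {1: 1, 2: 4, 3: 16, 4: 32}


def response_frame_positions(total_frames, spacing):
    """Return 1-based positions of response frames.

    Assumption: every `spacing`-th frame (spacing, 2*spacing, ...) elicits
    a response, so each block of frames ends with a response frame.
    """
    return list(range(spacing, total_frames + 1, spacing))


def response_counts(total_frames=TOTAL_FRAMES, spacings=VERSION_SPACING):
    """Count the response frames in each version."""
    return {
        version: len(response_frame_positions(total_frames, spacing))
        for version, spacing in spacings.items()
    }


counts = response_counts()
for version, n in counts.items():
    print(f"Version {version}: {n} response frames")
```

Under this assumption, a learner working through version 4 responds 28 times in a program, compared with 896 times in version 1.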

The resultant 16 programs (four contents X four versions) were used in 
a Greco-Latin square design with tenth-grade high school students. 
Initially 196 students were sampled, but only 180 of these were actually 
enrolled in school. Since students could be excused from the study at 
their parents' request, attrition became a major problem and while some 
data were collected on all 180, only 45 finished all aspects of the study.

The students were randomly assigned to four treatment groups and 
each treatment group worked through four programs representing all four 
subject areas and all four versions. Criterion measures were four-week, 
delayed achievement tests in each subject, completion time records, error 
score records, and attitude scales. Additionally, data were collected
on various learner-centered variables including age, sex, mental ability, 
reading ability (3 scores), cognitive style, interest in each subject, 
and background in each subject. 
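
The assignment described above can be sketched as a 4 x 4 Greco-Latin square in which each treatment group meets every subject area and every version exactly once, and every subject-version combination is used by exactly one group. The two squares below are one valid orthogonal pair chosen for illustration. The report's summary does not give the actual square used, so the specific pairing shown is an assumption.

```python
CONTENTS = ["astronomy", "computer programming", "psychology", "statistics"]
VERSIONS = [1, 2, 3, 4]

# Two mutually orthogonal 4 x 4 Latin squares (illustrative choice only).
# Rows are treatment groups; columns are the order in which programs are taken.
CONTENT_SQUARE = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
]
VERSION_SQUARE = [
    [0, 1, 2, 3],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
    [1, 0, 3, 2],
]


def build_assignment():
    """Return assignment[group][position] = (content, version)."""
    return [
        [
            (CONTENTS[CONTENT_SQUARE[g][p]], VERSIONS[VERSION_SQUARE[g][p]])
            for p in range(4)
        ]
        for g in range(4)
    ]


assignment = build_assignment()
for g, row in enumerate(assignment, start=1):
    print(f"Group {g}: {row}")
```

Superimposing the two squares yields all 16 content-version pairs, which is the property that lets four groups cover the 16 programs.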

Because of the attrition problem, a discriminant function analysis 
was used to compare the four resultant treatment groups on the learner- 
centered variables. No significant differences were found either in the 
multivariate case or any of the univariate cases. Thus, it was concluded 
that the randomness of the treatment groups was maintained. 

Discriminant function analyses were run among the treatment groups 
for each of four classes of criterion variables. No significant diffe- 
rences were found for the achievement test scores among the groups. This 
supports the possibility that programmed instruction is discrepant from 
its operant model. As would be expected, significant differences were 
found among the treatment groups on completion time, with the materials 
requiring more responses generally requiring more time to complete. 

Because of the difference in the maximum possible number of errors 
across versions, error scores were converted to the proportion of possi- 
ble errors. No significant differences were found among the groups for 
these converted error scores. Similarly, the groups were not found to 
differ on attitude scale scores. 
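
The conversion described above can be sketched as follows. The sketch assumes that the maximum possible number of errors in a version equals its number of response frames (one response per response frame). The raw error counts are hypothetical and are not data from the study.

```python
def error_proportion(errors, possible_errors):
    """Convert a raw error count to the proportion of possible errors."""
    if possible_errors <= 0:
        raise ValueError("possible_errors must be positive")
    if not 0 <= errors <= possible_errors:
        raise ValueError("errors must lie between 0 and possible_errors")
    return errors / possible_errors


# Hypothetical (errors, possible errors) for one program in versions 1 and 4.
raw = {1: (112, 896), 4: (7, 28)}
proportions = {v: error_proportion(e, n) for v, (e, n) in raw.items()}
print(proportions)
```

The raw counts of 112 and 7 are not comparable across versions, but the proportions are on a common 0 to 1 scale.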

The third question asked above as an objective proved to be 
unanswerable within the scope of the study. However, it was established that 
strong trends for individual variations across unit sizes did not exist 
where such variations were monotonic and summable.

Multiple linear regression was used for the fourth question and 
remarkably strong predictions were obtained. R²'s ranged from .65 for 
the computer programming achievement test to .81 for the astronomy 
achievement test. In two cases, the obtained R²'s slightly exceeded 
the maximum R²'s permitted by the test reliability and in a third case, 
the R² was almost as high as the maximum. Strongest single predictors 
were mental ability and the reading scores.

Recommendations are made to curriculum material developers to con- 
sider abandonment of high response frequency programmed instruction and 
to seek a better approximation to the operant model. Suggested future 
research includes replication of the study under various conditions 
including additional criterion measures. Also, future research should 
study individuals to determine if unit size variations make a differe- 
rence to some people even though the pooled effects show no differences. 

Chapter I

PROBLEM AND RELATED RESEARCH 



Problem Statement 

A major trend in American education today is individualized 
instruction in which an attempt is made to match the curriculum to 
the needs and characteristics of each student. While various models 
of individualized instruction are being suggested and employed 
(e.g., IPI, PLAN, PFTME), there is very little known about how to 
match specific modes and media to individuals and where to begin
instruction on a specific subject with a particular child. Thus, 
matching instruction to students is still largely an unresolved 
problem in education. 

One of the reasons why educators do not have guidelines for 
matching instruction to the child is because all of the relevant 
variables have not been identified and because many of those which 
are identified are only poorly understood. Another reason is that 
the more basic question of how tailor-made a curriculum must be to 
permit optimum growth in each student has not been resolved. Some 
educators and psychologists tend to believe that a single curriculum 
can be devised to serve for everyone. For example, many programmed 
instruction and contingency management advocates believe that the 
proper application of reinforcement is the crucial point in curri- 
culum and that other curriculum concerns can be minimized or ignored. 
The polar position is that individual differences are great enough 
to require a wide variety of learning experiences to meet the needs 
of various people and that in the extreme each individual should have 
a different curriculum. The resolution of these positions is largely 
an empirical question subject to extensive research. However, it is 
unlikely that a single experiment or even a series of experiments can 
answer the questions since the problems are quite diffuse and the 
possible curriculum-learner combinations are infinite. Single studies 
can shed light on very limited aspects of the problem and can be ex- 
pected to leave more unanswered questions than answered ones. 

In the present study, programmed instruction (PI) has been 
studied in an attempt to identify the relationships or lack of rela- 
tionships between various learner characteristics and certain material 
variables. Programmed instruction frequently has been used as one 
method of individualizing instruction, but there are no guidelines as 
to where or how it can best be employed. Skinner (1958) advanced his 
version of PI as an optimum instructional device for learners in 
general, but the evidence to date does not substantiate its general 
superiority over other methods. The research on PI tends to give a variety 
of results on its usefulness and there has been little consistency of 
findings (e.g., see reviews by Silberman, 1962 and Schramm, 1964). Hence, 
while Stolurow (1963) has recommended that autoinstructional and 
conventional methods be alternated in the classroom, there are no rules or guides 
for the implementation of autoinstruction unless it is believed superior 
to all other instruction.

In addition to its role as a potential individualizing agent, PI 
merits study because it permits a greater control over variables in the 
learning situation than do other, more conventional, methods of 
instruction. This characteristic permits some study of the more basic learning 
phenomena to be undertaken in field research. There are, of course, a 
great number of variables operative in any learning situation and Fry 
(1963) has identified 212 variables in PI. Obviously no one study can 
hope to meaningfully manipulate more than a few of these variables. 
Elsewhere, the salient independent variables involved in auto-instruction 
have been grouped into five categories (Flynn, 1968). These categories 
are

(1) structural or format variables;

(2) content variables; 

(3) learner-centered variables; 

(4) structure-learner interaction variables; and 

(5) content-learner interaction variables. 

While some overlap may exist among these categories, the structural 
and content variables are material centered and are independent of the 
learner. Similarly, the learner-centered variables are independent of 
the structure and content. The two interaction categories are 
dependent upon the learner and upon the materials and involve such 
characteristics as the learner's prior experiences with the particular content 
and structure.

In the present investigation, variables in four of these categories 
were either manipulated or measured and the fifth category was partially 
taken into account. The variables studied are listed below by category: 

(1) Structural: Amount of material presented between response 
elicitations (i.e., unit size) was varied. 

(2) Content: Programs in four subject areas were used. 

(3) Learner-centered: Variables including sex, age, reading 
ability, intelligence, and cognitive style were measured. 

(4) Structure-learner interaction: No variables were varied or 
measured, but all Ss were given a brief familiarization 
period with each type of structure. 

(5) Content-learner interaction: Measures were taken of Ss' 
interest in each content area and of Ss' prior 
experience with each content area.

The structural variable of unit size is defined as the amount of 
material presented between elicitations of responses. In one sense 
unit size is a "size of step" problem, but since size of step has been 
used in at least five different ways, its use will be avoided in 
referring to the present study. A "unit" as used in the present study can
contain one or more frames. A unit is terminated by the presence of 
a response frame (i.e., one eliciting a response) and all other frames 
in the unit, if any, are non-response frames. If a unit consists of 
a single frame, then that frame must be a response frame. A frame is 
operationally defined in the present study as a discrete amount of 
material containing one or more sentences and being physically separate 
from other such amounts of material. 
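
The definition of a unit given above can be expressed as a small sketch. Frames are represented here only by a flag saying whether each one is a response frame; this representation and the function name are assumptions made for illustration.

```python
def split_into_units(is_response):
    """Group frame indices into units, each ending at a response frame.

    `is_response` is a sequence of booleans, one per frame, in program order.
    Returns (units, leftover), where `leftover` holds any trailing
    non-response frames that are not yet terminated by a response frame.
    """
    units, current = [], []
    for index, responds in enumerate(is_response):
        current.append(index)
        if responds:
            # A response frame terminates the unit.
            units.append(current)
            current = []
    return units, current


# Three non-response frames, a response frame, a single-frame unit,
# and one trailing non-response frame.
flags = [False, False, False, True, True, False]
units, leftover = split_into_units(flags)
print(units, leftover)
```

A single-frame unit necessarily consists of a response frame, as in the second unit of the example.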

One reason why unit size was selected for study is that there 
has been little evidence supporting the position advocated by 
Skinner (1958) that programs comprised of very short frames provide 
optimum learning for all individuals, and consequently various 
investigators disagree with this position. For example, Deterline (1967) 
states that steps which are too small "can actually interfere with the 
desired learning" (p. 212). Similarly, Pressey, who is considered the 
father of teaching machines, states that "detailed small-step 
programming may be exceedingly useful -- but surely not for everybody on 
everything" (1960, p. 503).


Part of the problem with short frame programs may be caused by 
several apparent discrepancies between reinforcement theory and the 
mechanics of auto-instruction. First, auto-instruction does not seem 
to fit the operant paradigm as characterized in laboratory studies. 
Since the stimuli and responses both vary throughout a program, the 
student basically must learn the content in PI in a single trial. If 
this is true, the response of a small frame may only be related to 
the content of that frame as has been suggested by Deterline (1967).

Secondly, since responses in PI are elicited by specific cues, 
PI differs from operant studies in which the responses are emitted 
rather than elicited. Because of this, Lumsdaine (1962) has suggested 
that PI may better fit contiguity theory than reinforcement theory.

A third problem in relating PI to reinforcement theory has been 
suggested by Scandura (1966) who points out that shaping techniques 
have not yet been specified for meaningful verbal material. Consequently, 
the learning principles discovered in laboratory studies involving the 
modification of overt behavior may not be directly useful in teaching
symbolic material. 


Even if these problems did not exist and PI did adequately parallel 
the operant paradigm, a further consideration is that feedback (i.e., 
knowledge of correctness of the response) may not be reinforcing. 
Reinforcement is defined to occur when the behavior preceding the reinforcing 
event has an increased probability of occurrence and a basic tenet of 
PI is that feedback is such an event. However, if the learner's
response is not correct, seeing the correct response is not analogous 
to a laboratory situation in which an animal fails to emit the correct 
response. In PI, the learner is provided with information by which he 
can covertly correct his error. In the typical laboratory experiment, 
the animal is given no information after an incorrect response except 
that the experimenter or the apparatus fails to respond. 

On the other hand, seeing the correct response does not necessarily 
provide the learner with a reinforcing event. Briggs et al. (1962) 
report a study in which students who read frames without overtly 
responding had superior test scores and time-efficiency over those who overtly 
responded. These frames were ones of low difficulty based on the degree 
of cueing, and the investigators point out that in this case the overt 
responses with the resultant feedback were not as effective as no 
responses. In related findings, Licklider (1962) reports that highly 
motivated students found scores and comments unimportant in a 
computer-assisted instruction study, while the unmotivated students relied upon 
them. These studies tend to indicate that feedback in PI cannot be 
categorically classified as reinforcing, but rather that its 
reinforcing qualities depend upon the learner in relation to the program.

Testimony to the relative lack of reinforcement in PI is provided by 
several recent studies which have used contingency management as additional 
motivation in PI tasks (e.g., Homme, 1964 and Clements and McKee, 1968).

If these discrepancies do exist between PI and other manifestations 
of operant conditioning, then the effect of having frequent responses in 
PI cannot be predicted from reinforcement theory. This is especially 
true if feedback is not consistently reinforcing.

One approach to studying the effect of responding is to eliminate 
the responses while holding all other variables constant. Silberman 
(1962) summarized 15 studies on the relative effectiveness of overtly 
responding versus covertly responding in PI. Two of the studies found 
overt responding produced higher test scores, four found that covert 
responding produced higher scores, and the remaining nine showed no 
difference. Briggs et al. (1962) report a study in which one group of
students worked through linear programs which required overt responding 
while another group worked through the same programs with the response 
blanks filled in. The reading group surpassed the overt responding group 
in the efficiency of time and in achievement test scores for the frames 
which were heavily cued. This suggests that frames which provide easy 
prompts for the student are not as effective a teaching device as read- 
ing the same material. 

Krumboltz and Kiesler (1965) studied the effects of varying the 
frequency of asking questions and of providing feedback on those questions 
in PI. They used six versions of a 177 frame program. These versions, 
presumably arranged in descending order of the amount of reinforcement 
they provide, are: 

1. The standard program in which a question was asked on every 
frame with the answers provided. 

2. A question on every frame with the answer provided for 
every fifth frame. 

3. A question on every frame with the answer provided for 
every tenth frame. 

4. A question only on every fifth frame with answers provided. 

5. A question only on every tenth frame with the answers provided. 

6. No questions asked with all frames consisting of declarative 
statements.

Achievement test results of the high school students in the sample 
differed significantly in an analysis of variance with scores 
decreasing from version 1 through 6, except for version 5. Delayed test 
results (2 months) showed no significant differences.

There is some indication that the data in the study did not have 
homogeneous variances (see Flynn, 1968) which could produce signifi- 
cance where it does not exist. Assuming, however, that the findings 
are valid, the study generally supports the position that frequent 
responding facilitates learning in PI. However, it also suggests that 
reinforcement is present with or without knowledge of the correct 
answers. 

In another study, Flynn (1968) found that the frequency of 
responding made no difference with tenth and eleventh grade students on a 
criterion test. Seven versions of an 864-frame psychology program ranging 
from responses on every frame to no responses at all were used with 
seven treatment groups. While completion times varied significantly 
across groups, scores on an achievement test did not. This suggests 
that in this case responding and its feedback were not important to 
learning (or that reinforcement was present with or without responding).

Thus, the research to date has not provided a definite answer to 
the effect of responding in PI. If its effects cannot be predicted by 
reinforcement theory, then the divergent results reported above are to 
be expected. The present study has further explored the effects of 
response frequency. 


In addition to the structural variable of unit size, the content of 
the programs was varied in the present study. Primarily, this was done 
to determine if the effects of the other variables, such as unit size, 
were content specific or not. Thus, programs in astronomy, computer 
programming, psychology, and statistics were employed. These particular 
subject areas exhibit some variation in the relation of verbal to numeric 
content and in the degree of abstractness.

It is difficult, if not impossible, to vary the content of PI without 
also varying other variables. For example, the type of content often 
dictates the type of illustrative materials employed. Further, different 
content will contain technical words and jargon in varying frequencies 
and the type of such terms will be qualitatively different. In the 
present study in which commercial programs were employed as the source of 
the instructional materials, differences will occur in the original 
length of the frames, in the readability of the materials, and in the 
degree to which the material is programmed. The latter variable has 
been identified by Holland (1967) and is the extent to which the 
required response is dependent upon the other content in the frame.

Thus, variations in the content will tend to introduce extraneous 
sources of variation that will confound comparisons across the content 
areas. Although some adjustments for these differences were made in 
the present study, in general differences in criterion variables across 
the content areas must be attributed to all the differences among the 
programs and not just to content.

Various learner-centered variables were also studied. One reason 
for including these variables was to determine if the effectiveness of 
the PI materials varied as a function of the learner characteristics. 
Secondly, if the effectiveness does vary, the learner characteristics 
can perhaps provide some prediction of that effectiveness.


While various learner-centered variables have been studied in 
relation to criterion measures on PI, there is a lack of agreement on 
the importance of individual differences in PI. Skinner-type 
programs have been assumed to level out individual differences, while 
branching programs capitalize on them. The empirical data have also 
been indecisive in resolving the problem, with some investigators 
finding individual differences to be important in PI while others find that 
they are not.

The effects of intelligence have been extensively studied in 
relation to PI achievement. Tuel (1966) summarizes some of the research
in this area and reports that the findings are "somewhat equivocal." 
Kapel (1965) found no relationships between intelligence and PI 
achievement. Shay (1961) found a relationship between intelligence and error
rate but not with achievement. Melching (1965) found relationships 
between several measures of intelligence and achievement in PI. Flynn 
(1968) found a significant correlation between achievement test scores 
and ability. Alter (1963) and Tuel (1966) report a relationship between 
intelligence and retention over time in PI. Snelbecker and Downes (1967) 
report that PI reduces but does not eliminate individual differences in 
ability and personality. Stolurow (1961) reviews some of the literature 
on the relation between ability and performance on PI and concludes that 
at that time there was no reason to assume that the same program could 
not be used with students at different levels of intelligence. He cited 
two studies which indicated an interaction exists between ability and 
reward. While these studies were not with PI, the existence of such an 
interaction could explain some of the inconsistency of findings on the
effect of ability. 

Various other learner characteristics have been studied in relation 
to performance on PI. Several of these studies are cited here as 
examples of what has been done. Kapel (1965) studied the effects of reading 
comprehension and found relationships with achievement. Kight and 
Sassenrath (1966) found anxiety was related to criterion measures 
on PI, but Lache (1967) and Ripple et al. (1965) failed to find 
significant relationships with anxiety. McNeil (1964) found sex 
to be related to achievement, but Filep (1967) found sex was an 
unimportant variable. Feldhusen and Eigen (1963) found a relationship 
between attitudes and achievement in PI.

In brief, the evidence is confusing regarding the effect of indi- 
vidual differences on PI. However, there is a long history of find- 
ing individual differences to be important in various endeavors and 
it would be quite surprising if they were not important in PI. 

At least part of the divergent findings in PI studies is 
probably accounted for by poor controls in much of the research. Studies 
frequently involve very short programs, lack of control over 
acquisition experiences, data which fail to meet the assumptions of the 
analysis techniques employed, inadequate criterion measures, biased 
sampling, and generally poor designs. For example, a review of one 
study (Flynn, 1967) identified 11 principal deficiencies in the research.
Silberman in 1962 stated that research on PI was quite limited and 
that "beyond demonstrating that a carefully written set of materials 
will teach if the student will spend enough time on them, we have 
little unequivocal evidence for the principles of programmed instruction." 
The situation has not improved much since then. 

No structure-learner interaction variables were experimentally 
manipulated or measured in the present study, but the students were 
given brief familiarization periods with each program prior to 
working on the material on which the criterion measures were based. While 
some of the students had had prior experience with PI, they would not 
be expected to have had experience with the different unit sizes 
manipulated in this study.

The content-learner interaction was taken into consideration by 
measures of the students' interest in the particular subject area and 
by measures of their previous experience with it. In general, PI
does not make allowances for students’ previous knowledge of content; 
instead all students must begin at the same point and are requested to 
overtly respond even though they may be quite familiar with the content. 
While in general, repeated learning or even overlearning is not con- 
sidered detrimental, compelling the student to respond could impede 
the learning process if PI does parallel animal operant studies. If
behavior shaping is involved, the knowledgeable student must have exist- 
ing behavior reshaped. The analogous laboratory situation is not 
repeated trials on a task, but is the reshaping of the animal's behavior.
For example, having a pigeon which was trained to turn circles relearn 
to turn circles would parallel the PI situation. This procedure would 
not only be a waste of time but would undoubtedly interfere with pre- 
viously learned behavior. 

In summary, the problem under consideration in this study is the 
determination of the effect on learning criterion variables of different 
unit sizes in PI. Since learning in general is at least in part a 
function of learner characteristics, various learner variables were 
measured and their relationships to the criterion variables were 
studied.



Objectives 

The purpose of the study is to explore a delimited aspect of 
the general problem of identifying optimal learning materials for
given individuals. The specific question being asked is the following: 

Is there an optimal unit size (i.e., frequency of response 
elicitations) for learning to occur in linearly sequenced 
auto-instructional programs and if there is, is this unit size related to the 
program content and/or to certain learner-centered variables?

Following from this question, the objectives of the study are to 
determine answers to the following questions: 

(i) If the unit size in linearly programmed instructional material 
is varied while all other material-centered variables are 
held constant, will there be a differential efficiency of 
learning regardless of the individuals involved? 

(ii) Does the optimal unit size, if any, vary with the content of 
the program? 

(iii) Does the optimal unit size, if any, vary with the specific 
individual learner? 

(iv) Can learner-centered variables be identified which will enable 
accurate predictions of success to be made with respect to 
frame size and content?

The study is exploratory, and no specific hypotheses are being 
tested.

Chapter II



PROCEDURES 



Sample 

A sample of 196 students was selected from the tenth grade at 
Nova High School in Fort Lauderdale, Florida to participate in the 
study. Nova High School is an experimental school which draws stu- 
dents from throughout Broward County, but most of the students are 
from middle class homes. This school was selected in particular 
because the flexible scheduling permitted students to be scheduled 
into special classes for the study. 

In the initial planning of the study, it was decided to use 
two groups of 80 students each with the second group being a com- 
plete replication of the study. However, in order to compensate 
for expected attrition, the administration of the high school was 
asked to schedule 200 students into classes for the project. The 
tenth grade was selected because the administration stated that the 
schedules of tenth graders were most likely to permit the addition 
of the project as a class. No specific selection criteria were 
employed. The school's computer scheduling program was instructed to
attempt to schedule all tenth graders into one of eight sections 
entitled "Nova R" (Nova Research). Of about 500 tenth-grade 
students, approximately 400 were scheduled into Nova R. Since this
was twice the number of Ss required, four sections were arbitrarily 
dropped. The four sections remaining each had 49 students for a 
total N of 196. However, 16 of these students were not actually
enrolled in school, so that only 180 students were in the sample. 
Two sections were used in the first trimester of the 1968-69 school 
year and two sections were used in the second trimester. 

Since Nova R was an extra class which the students took in 
addition to their regular courses, the high school administration 
required that students be given the option of not participating in 
the study. While such a procedure creates methodological problems, 
it was adhered to and a letter describing the project was prepared 
and distributed to the students at the first session of the class. 
At the bottom of the letter was a permission slip to be signed by 
the parents giving or denying the student permission to participate. 
A copy of this letter is included in Appendix A.

Attrition became a serious problem in the study since many 
students were excused at their parents' request and only a total of 45 
students completed all parts of the study (although some data were 
collected on most of the students). The resultant characteristics 
of the sample and the causes and effects of the attrition are 
discussed in Chapter IV.






Materials 

Four commercially prepared auto-instructional programs were selected 
for use in the study. These programs are Introduction to FORTRAN 
(Plumb, 1964), Analysis of Behavior (Holland and Skinner, 1961), 
Programmed Astronomy I - The Solar System (Sullivan and Sullivan, 1963), 
and Descriptive Statistics - Volume I - A Programmed Text (Gotkin and 
Goldstein, 1964).¹ Permission was obtained from each of the publishers 
to modify and reproduce materials as required by the study.

The criteria used in selecting the specific programs were (1) that 
the contents be ones in which the students would probably not be knowledge- 
able; (2) that the length of the material be long enough to provide sev- 
eral hours of instruction; and (3) that the programs be representative 
of the principles of the Skinnerian approach (i.e., small frames with
overt, structured responses, and linear sequencing). The last criterion 
was included because only programs of this type can be readily adapted 
to the needs of the study. For example, if the programs were branching, 
it would be extremely difficult to combine frames to vary the unit size. 
Also branching programs generally have longer frames and usually require
multiple-choice responses rather than constructed ones. 

The different subject fields represented by the materials provide 
for variation in the skills required by the students and are different 
enough to appeal to different interests. Further, the programs can be 
loosely listed on a continuum reflecting their numeric content with 
statistics being most numeric, followed by FORTRAN, astronomy, and psy- 
chology in order. 

Frames were selected from each of the commercial programs and after 
editing, eight hundred ninety-six frames in each content area were used 
for the study. In general, the frames used were taken in sequence 
starting at the beginning of each program. Some editing was done to 
make the programs approximately parallel in construction in terms of 
the length of frames and form of responses. Some of the original frames 
in the commercial programs required the student to select a multiple- 
choice answer. These frames were rewritten to require a structured 
response by the student. Several original frames required multiple 
responses and these were rewritten to require only a single response. 

The programs were edited so that the number of words in each frame 
was at least 5 and no more than 40. Most frequently, this was accomplished 
by combining two or more smaller frames and by splitting larger frames so 
the length criteria were met. Even after this editing, differences in 
frame length existed between the four programs. Table 2-1 gives the 
mean frame lengths in number of words and number of sentences based upon 
a sample of 40 frames from each program. 



1 Brief descriptions of the content of each of the programs are 
contained in Appendix B. 
Table 2-1 

Mean Frame Length in Number of Words 
and Sentences Based on 40 Frames 

                                       Programs 

                                    Computer 
                      Astronomy   Programming   Psychology   Statistics 

Mean # of Words          23.5        27.1          24.6         19.2 

Mean # of Sentences      1.42        1.25          1.45         1.28 




The four programs also can be expected to vary in other character- 
istics. For example, the readability of the programs can be expected 
to vary since there is some variation in the age levels for which each 
of the original programs was prepared. Since there seems to be no 
measure of readability especially designed for use with programmed 
materials, the Dale-Chall formula for regular prose was used (Dale and 
Chall, 1948a and 1948b). Five selections were sampled from each program 
and the readability was determined. The formula revealed differences among 
the programs as shown in Table 2-2. Thus, the astronomy and statistics 
programs should be easier to read than the other two. 



Table 2-2 

Readability of the Four Programs Based 
on the Dale-Chall Formula 

                                        Programs 

                                     Computer 
                       Astronomy   Programming   Psychology   Statistics 

Mean Raw Score            6.95        8.17          8.33         6.83 

Corrected Grade Level     7-8         11-12         11-12        7-8 
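
The relation between the raw scores and the corrected grade levels in 
Table 2-2 can be illustrated with a short sketch. The constants and the 
grade bands below are those of the published Dale-Chall (1948) formula 
and are not stated in this report; the function names are hypothetical. 
The percentage of unfamiliar words must be counted against the Dale word 
list, which is not reproduced here, so the sketch takes that percentage 
as an input rather than computing it. 

```python
# Sketch of the Dale-Chall (1948) readability computation.
# Assumption: constants and grade bands are the published ones, not
# values taken from this report.

def dale_chall_raw_score(pct_unfamiliar_words, avg_sentence_length):
    """Raw score from percent of words off the Dale list and mean words per sentence."""
    return 0.1579 * pct_unfamiliar_words + 0.0496 * avg_sentence_length + 3.6365


# (upper bound of raw score, corrected grade level)
GRADE_BANDS = [
    (5.0, "4 and below"),
    (6.0, "5-6"),
    (7.0, "7-8"),
    (8.0, "9-10"),
    (9.0, "11-12"),
    (10.0, "13-15"),
]


def corrected_grade_level(raw_score):
    """Map a raw score onto the corrected grade-level band."""
    for upper_bound, band in GRADE_BANDS:
        if raw_score < upper_bound:
            return band
    return "16+"


# Mean raw scores reported in Table 2-2.
table_2_2 = {
    "Astronomy": 6.95,
    "Computer Programming": 8.17,
    "Psychology": 8.33,
    "Statistics": 6.83,
}
levels = {name: corrected_grade_level(score) for name, score in table_2_2.items()}
```

Applied to the four mean raw scores, the band lookup reproduces the 
corrected grade levels given in the table. 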



In addition to frame length and readability, the programs probably 
vary from each other in various other ways. For example, the degree to 
which they are programmed according to Holland's definition (1967) and 
as discussed in Chapter I probably varies. Likewise, the amount of 
redundancy and cueing can be expected to vary. However, it would be 
extremely difficult to have programs vary in content and not in any 
other characteristics. Consequently, as mentioned earlier in this 
report, differences in student performance among the four programs 
cannot be attributed solely to content, but are a function of all of 
the differences which exist. 

The first ninety-six of the frames in each program were used as 
introductory material to familiarize the students with the particular 
version and content and were not used as a base for collecting data. 
The remaining 800 frames were used for the study and provided the 
instructional situation from which data were collected. The intro- 
ductory materials were presented in booklets separate from the other 
materials. The two booklets were designated Part I and Part II 
respectively for each subject area. In addition to frames, both 
Part I and Part II booklets contain several "Exhibits" (i.e., graphs, 
tables, pictures, explanatory materials, etc.). Originally only the 
psychology program contained exhibits per se, and the illustrative 
material in the other programs had to be grouped to form the exhibits. 

Each of the four programs was rewritten into four versions with 
all variables except unit size per response held constant. These 
versions are: 

(1) Small frames as originally written. 

(2) Material of four of the smaller frames combined. 

(3) Material of 16 of the smaller frames combined. 

(4) Material of 32 of the smaller frames combined. 

For illustration, the following are four adjacent original frames 
from the psychology program. 

In withdrawing the hand from the hot surface, 
arm movement is a response which is elicited 
by a painful ________ to the hand. 

In the hand-withdrawal reflex, the stimulus must 
be intense enough to exceed the ________ or no response 
will occur. 

A light flashed into the eye elicits constriction of 
the pupil. This sequence is called the pupillary ________. 

In the pupillary reflex, a flash of light is said to 
________ the response. 

These four frames are used verbatim in Version 1 of the 
psychology program, but for Version 2 they were modified as 
follows: 



In withdrawing the hand from the hot surface, arm 
movement is a response which is elicited by a painful 
stimulus to the hand. 

In the hand-withdrawal reflex, the stimulus must be 
intense enough to exceed the threshold or no response 
will occur. 

A light flashed into the eye elicits constriction of 
the pupil. This sequence is called the pupillary 
reflex. 

In the pupillary reflex, a flash of light is said 
to ________ the response. 

The only change for Version 2 is that the responses have been 
filled in for the first three original frames. Consequently, only 
one response is now required for the four frames combined. The 
original spacing and format are maintained so that the only change 
is the unit size or frequency of responding. In the subsequent 
versions, responses only occur for the last original frame in each 
of the composite frames. 2 
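
The editing rule described above can be sketched in a few lines. This is 
only an illustration under stated assumptions: the representation of a 
frame as a text with a blank marker plus its correct response, and the 
names BLANK and build_version, are hypothetical, and the number of frames 
is assumed to be a multiple of the unit size (as is the case for the 96 
introductory and 800 study frames). The response for the fourth frame is 
not given in the example above, so a placeholder is used for it. 

```python
# Sketch: derive a larger-unit version by filling in every response
# except the one in the last original frame of each composite unit.
# BLANK and build_version are hypothetical names used for illustration.

BLANK = "________"


def build_version(frames, unit_size):
    """frames: list of (text_with_blank, correct_response) pairs.

    Returns a list of (text, requires_response) pairs in which only the
    last frame of each group of `unit_size` frames keeps its blank.
    """
    version = []
    for index, (text, response) in enumerate(frames):
        is_last_in_unit = (index + 1) % unit_size == 0
        if is_last_in_unit:
            version.append((text, True))
        else:
            version.append((text.replace(BLANK, response), False))
    return version


psychology_frames = [
    ("In withdrawing the hand from the hot surface, arm movement is a "
     "response which is elicited by a painful ________ to the hand.",
     "stimulus"),
    ("In the hand-withdrawal reflex, the stimulus must be intense enough "
     "to exceed the ________ or no response will occur.",
     "threshold"),
    ("A light flashed into the eye elicits constriction of the pupil. "
     "This sequence is called the pupillary ________.",
     "reflex"),
    ("In the pupillary reflex, a flash of light is said to ________ "
     "the response.",
     "<response not shown in the example>"),  # placeholder, not from the report
]

version_1 = build_version(psychology_frames, unit_size=1)
version_2 = build_version(psychology_frames, unit_size=4)

# Number of responses required over the 800 study frames in each version.
responses_required = {size: 800 // size for size in (1, 4, 16, 32)}
```

Because only the blanks are filled in, the text read by the student is 
the same in every version; what changes is the number of responses 
required. 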

The particular versions used were selected to maximize differences 
in the criterion variables based on the findings of a previous study 
(Flynn, 1968). The previous study used the same basic psychology pro- 
gram (but with 64 additional frames) as the present study with seven 
different versions of unit size. These versions were: 

(1) Small frames as originally written used as units. 

(2) Material of two of the smaller frames combined as a unit. 

(3) Material of four of the smaller frames combined as a unit. 

(4) Material of eight of the smaller frames combined as a unit. 

(5) Material of 16 of the smaller frames combined as a unit. 

(6) Material of 32 of the smaller frames combined as a unit. 

(7) Material of all of the smaller frames combined so that no 
responding is required. 

The last version (Version 7) was eliminated from the present study 
because it may be qualitatively different from the other versions since 
responding is not required. Figure 2-1 graphically summarizes the find- 
ings by version of the earlier study (see discussion in Chapter I). 



2 Examples of each of the four versions of the psychology program 
are contained in Appendix B. 



Figure 2-1 
No overall significance was found among the achievement test scores, 
but the completion times did differ significantly. From these find- 
ings, the following decisions were made: 

(1) Version 1 (original frames) was selected because 

    a) it is the basic unit and is most comparable in format to 
       the commercial programs. 

    b) the Version 1 treatment group had the highest achievement 
       test mean. 

    c) the Version 1 treatment group had the highest completion 
       time mean and differs significantly from all other groups 
       except the Version 2 group. 

(2) Version 3 (four frames combined) was selected because the 
    Version 3 group 

    a) represents a low on achievement test means. 

(3) Version 5 (16 frames combined) was selected because the 
    Version 5 group 

    a) represents a low on achievement test means. 

    b) represents the lowest completion time. 

(4) Version 6 (32 frames combined) was selected because the 
    Version 6 group 

    a) represents the highest point on the right hand side of 
       the U shaped completion time curve (after Version 7 
       was eliminated). 

    b) represents the highest achievement test score on the 
       right hand side of the U shaped curve. 

While these findings were obtained with only the psychology pro- 
gram, they are the only relevant empirical guidelines available. Con- 
sequently, they were used as the basis for selecting the experimental 
versions of material in all four content areas. 



Design 

Since various learner characteristics could affect the students' 
performance on the different unit sizes, a repeated measures design was 
employed in which all students worked with each unit size. Using 
repeated measures minimizes the need to assure comparability among 
the treatment groups on various variables (the sample is small enough 
that random assignment to groups would probably not ensure the uniform 
distribution of the relevant characteristics). 
Since the students could not work through the same materials 
repeatedly, different contents were used with each student for each 
unit size. To allow for possible ordering effects of the unit sizes 
and the contents, a Greco-Latin square design was employed as illus- 
trated in Figure 2-2. 



Figure 2-2 

Greco-Latin Square Design Employed 
Initially, it was planned to use the square shown in the figure 
with 80 students in the first trimester and to use a second, orthogonal 
square with the second group of 80 students in the second trimester. 
However, the high attrition rate in both trimesters made that unfeasible 
and a single Greco-Latin square was used with the subjects in both tri- 
mesters (see Chapter IV for a discussion of the attrition). 
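
For readers unfamiliar with the design, the following sketch builds one 
standard 4 x 4 Greco-Latin square and checks its defining properties. It 
is not necessarily the arrangement used in Figure 2-2; the assignment of 
rows to successive programs worked and of columns to treatment groups, 
as well as all names in the code, are assumptions made for illustration. 

```python
# Sketch: one standard 4 x 4 Greco-Latin square pairing content areas
# with unit sizes. Assumption: rows are the successive programs a group
# works through and columns are the four treatment groups; the square in
# Figure 2-2 may be arranged differently.

CONTENTS = ["astronomy", "computer programming", "psychology", "statistics"]
UNIT_SIZES = [1, 4, 16, 32]  # original frames combined per response

# Multiplication by a fixed non-unit element of the four-element field,
# with field elements coded as the integers 0-3 (addition is XOR).
FIELD_TIMES_2 = [0, 2, 3, 1]


def greco_latin_square():
    square = []
    for period in range(4):
        row = []
        for group in range(4):
            content = CONTENTS[period ^ group]
            unit_size = UNIT_SIZES[FIELD_TIMES_2[period] ^ group]
            row.append((content, unit_size))
        square.append(row)
    return square


def is_greco_latin(square):
    """Each factor is a Latin square and every (content, size) pair occurs once."""
    n = len(square)
    for factor in (0, 1):
        for i in range(n):
            row_values = {square[i][j][factor] for j in range(n)}
            column_values = {square[j][i][factor] for j in range(n)}
            if len(row_values) != n or len(column_values) != n:
                return False
    all_pairs = {cell for row in square for cell in row}
    return len(all_pairs) == n * n


design = greco_latin_square()
```

In such a square every group meets each content area and each unit size 
exactly once, each content area and unit size occurs once in every 
position of the sequence, and each content area is paired with each unit 
size exactly once across the groups. 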

If the order of the presentation of materials is a significant 
factor in the students' achievement, then the use of a single Greco- 
Latin square might give biased results. The use of the second square 
would have better permitted the detection of an ordering effect. The 
most likely expectation is that the sequence per se is not particularly 
important, but that the students' attitudes and their degree of motiva- 
tion might have changed as they progressed through the study. If such 
changes did consistently occur, then the students' performances on the 
criterion measures should reveal them. For example, if the students 
became increasingly bored as they progressed, then their performance 
on each successive program should decline in each of the treatment 
groups. This problem is more thoroughly examined in the chapter on 
the analysis. 



All four sections of Nova R met for the first two classes in 
the first trimester while the project was explained and some initial 
data were collected (see below). On the second day, two sections 
were arbitrarily dismissed until the second trimester and the high 
school gave these students free time during the first trimester. 

After the initial dropout from the sample was completed (i.e., 
those students whose parents initially requested they not participate 
and those students who were not actually enrolled in school) the 
remainder of the students in the two sections in the first trimester 
were randomly assigned to one of four treatment groups corresponding 
to the columns in Figure 2-2. In the second trimester, the students 
who remained after the initial dropout were randomly assigned to the 
same four treatments so as to even out the groups. 

The first few days of the classes were used to explain the project 
and to collect initial data (see next section) on the students accord- 
ing to the following schedule: 

Class 1: Pass out and explain parental letters. 
         Have students complete information sheets. 

Class 2: Dismiss two sections and have remaining 
         students complete subject background 
         questionnaires, High School Curriculum 
         Survey, and Pick Two Pictures Test. 

Class 3: Administer Henmon-Nelson Test. 

Class 4: Administer Nelson-Denny Test. 

Class 5: Begin students on materials. 

Students who were absent on any of the data collection days were requested 
to complete the instruments which they missed at a later time. 

Monitors were employed to supervise the classroom and to distribute 
the materials and tests to the students throughout the trimester. The 
students worked independently without monitor assistance. The classes 
at Nova High School meet on an every other day basis, so the Nova R 
sessions alternate between a two-day week (Tuesdays and Thursdays) and 
a three-day week (Mondays, Wednesdays, and Fridays). There were 37 class 
days in the first trimester and 33 in the second. 

External contingencies were not provided to motivate the students' 
performances and although the students were promised a grade for their 
participation, the grade could not become a part of their high school 
record. When students finished the study, they were excused from Nova R 
and were given free time. 



Learner-Centered Variables 



Data on a number of independent variables were collected on the 
students. The variables on which these data were collected are listed 
below together with the instruments employed and the rationale for 
collecting them. The Information Sheet which is listed refers to a 
brief questionnaire the students completed on the first day of the 
class. The construction of the instruments which were developed 
especially for this project is described in Chapter III and copies 
of these are included in Appendix C. 



1. Sex. Each student recorded his or her sex on the Information 
Sheet. That sex differences influence learning has frequently 
been documented in various situations. In programmed instruc- 
tion, for example, McNeil (1964) found that kindergarten boys 
surpassed girls in PI on reading, but Filep (1967) found that 
sex was not an important variable with junior high school stu- 
dents on an auto-instruction task. 



2. Chronological Age. Each student recorded his month and year of 
birth on the Information Sheet. Since all of the students were 
in the tenth grade, age is fairly homogeneous within the sample. 
However, within the narrow range of age in the sample, the rela- 
tionship of age to the criterion measures is examined. 



3. Number of Years at Nova. The number of years previously in atten- 
dance at Nova High School was recorded by the students on the 
Information Sheet. This information was collected to determine 
if prior experiences in an individualized curriculum are related 
to the criterion measures. However, this is not considered to be 
an important variable. 



4. Courses in which Currently Enrolled. The students listed the 
courses in which they were currently enrolled on the Information 
Sheet. Thus, information about the curriculum track the students 
were following was available and its relationship to the criter- 
ion measures was explored. Further, students who were currently 
enrolled in courses similar to the subjects covered by the PI ma- 
terials could be identified. 



5. Reading Comprehension and Vocabulary. Because PI is primarily a 
verbal task, the students' reading ability can be expected to 
play an important role in achievement. For example, Kapel (1965) 
found a relationship between reading comprehension and achieve- 
ment in programmed learning. 

The Nelson-Denny Reading Test published by Houghton-Mifflin 
was used to measure reading comprehension, vocabulary, and read- 
ing rate. Alternate form reliability for comprehension is .81 
and for the other scores it ranges between .92 and .93. The 
test has received good reviews in Buros' Mental Measurements 
Yearbook. Form A was used in the study. 



6. Intelligence. Mental ability by definition should bear a 
strong relationship to most cognitive learning tasks. While 
some researchers claim PI should eliminate intelligence diffe- 
rences, others have not found this to be the case. Shay (1961), 
Alter (1963), and Tuel (1965) are among the investigators who 
have reported finding relationships between intelligence and 
achievement in PI. 

The instrument employed was the Henmon-Nelson Tests of 
Mental Ability (Revised Edition) which is a paper-and-pencil 
test published by Houghton-Mifflin. The publishers report 
an odd-even reliability of .95 and .94 for forms A and B 
respectively and an alternate form reliability of .89. This 
test also has received good reviews in Buros. Form A was 
used in the study. The raw score was used in the study rather 
than an IQ score because the interest was in mental ability 
independent of chronological age. 

7. Interest in Subject. An interest inventory was developed to 
determine the students' relative interest in the four subject 
(content) areas used in the study. Interest has been shown to 
be related to learning in programmed instruction in some situa- 
tions (e.g., Campbell, Bivens, and Terry, 1963). In addition, 
it is an everyday observation that a person who is interested in 
a subject will pursue it more vigorously than a person who is 
not interested. 

The instrument is entitled High School Curriculum Survey, 
and its development is described in Chapter III. Four scores 
are given by the instrument showing the relative interest held 
by the student in each of the four content areas of the programs. 

8. Cognitive Style. Cognitive style refers to an individual's pre- 
ferences in organizing and categorizing his perceptions and con- 
cepts of his environment. Some people are found who tend to 
look at the details in situations while others tend to look at 
the wholes. Since programmed instruction is an analytic approach 
to learning, the cognitive style of an individual should have 
some influence on how he achieves in this medium. This is 
especially true in the present study since the unit size vari- 
able influences the degree to which the materials are analytic. 

The instrument which was used is the Pick Two Pictures form 
developed by Kagan, Moss, and Siegel (1963). This instrument 
consists of a series of 19 groups of three pictures each. The 
student is instructed "to pick out two of the pictures that are 
alike in some way," in each group and to "write your reason 
for picking these two pictures." There are no correct answers, 
and usually any two of the three pictures may be selected on 
some logical basis. However, if pictures are selected because 
of similar detail, the student is considered to be analytic in 
his choice; if they are selected because of non-analytic charac- 
teristics, the student is considered to be global in his choice 
(Kagan and his associates identify three conceptual categories, 
and here "Inferential-categorical" and "Relational" have been 
combined as non-analytic). 

The usual practice in scoring the instrument has been to 
conclude that the person is analytic in his thinking if a 
majority of items were chosen on an analytic basis, and to 
conclude that he is non-analytic if the reverse situation 
prevails. The scoring was modified in the present study by 
using the actual number of analytic choices as a continuous 
score, rather than using a dichotomy. 
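
The difference between the usual dichotomous scoring and the continuous 
scoring adopted here can be shown with a brief sketch. The category 
labels follow the text above; the function names and the sample record 
are hypothetical and are not data from the study. 

```python
# Sketch: two ways of scoring the 19-item Pick Two Pictures form.
# The sample record below is invented for illustration only.

NON_ANALYTIC = {"inferential-categorical", "relational"}


def analytic_score(choices):
    """Continuous score used in the present study: number of analytic choices."""
    return sum(1 for choice in choices if choice == "analytic")


def dichotomous_label(choices):
    """Usual practice: analytic if a majority of items were chosen analytically."""
    if analytic_score(choices) > len(choices) / 2:
        return "analytic"
    return "non-analytic"


# Hypothetical student: 10 analytic, 5 relational, 4 inferential-categorical.
sample_choices = (
    ["analytic"] * 10 + ["relational"] * 5 + ["inferential-categorical"] * 4
)
sample_score = analytic_score(sample_choices)
sample_label = dichotomous_label(sample_choices)
```

A student with 10 analytic choices and one with 19 receive the same 
label under the dichotomy but different scores under the continuous 
method, which is the information the modification retains. 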

9. Prior Knowledge of Subject. What a student knows about a 
particular subject when he begins a new course has an effect 
upon his performance in the course. If he knows most of the 
materials that will be covered, he can probably get through 
the course with a minimum of effort. If the course is struc- 
tured for maximum growth and if the student is motivated, a 
student who knows a great deal about a particular course can 
greatly advance his knowledge beyond that of a naive student. 

If a student knows only a little or nothing at all about a 
course, he will probably have to expend more effort than the 
student who is more knowledgeable, and he is not likely to 
advance as far as the student who has prior knowledge. Further, 
as discussed in Chapter I, a particular problem may exist in 
PI with knowledgeable students if behavior shaping is involved 
as has been theorized by some. 

While it is important to know how much the students know 
about the content prior to the study, measurement of knowledge 
in a particular area prior to a course in that area may estab- 
lish expectations for the students about what is important in 
the course. While this phenomenon creates no problem in the 
normal classroom situation (in fact, it would probably improve 
the learning process), it does create a problem in research 
which attempts to evaluate the effectiveness of a given met- 
hod of instruction. If gain scores per se are not a concern 
in the study, direct measurement of existing knowledge through 
a pretest can often be omitted. However, the decision not to 
use gain scores must be justifiable and some other method 
should be employed to assess the existing knowledge. 

In the present study, partial control of existing know- 
ledge in the content areas which were employed in the study 
has been effected through the inclusion of subject matter 
which is largely novel to the high school curriculum. While the 
contents of the four programs to be employed (descriptive 
statistics, astronomy, psychology, and computer programming) 
are sometimes taught in high schools, they are usually elec- 
tives which only a small number of students take. Also, they 
are usually taught in the last two years of high school, so 
that the sample of tenth graders was not likely to have had 
courses in any of these fields. 3 For this reason, gain 
scores and the direct measurement of existing knowledge can 
be omitted with only a minimum of error, thus avoiding the 
problems of pretesting. 

Nevertheless, the possibility exists that some of the 
students did have prior experiences in one or more of the 
four topics, if not through formal instruction, then through 
informal reading, mass media, or association with people in 
these fields (e.g., parents). In order to take any existing 
knowledge into account in analysing the data of the present 
study, four instruments were developed which indirectly 
determine prior experiences within each of the content areas, 
without establishing students' expectations for the course. 
The development of the instruments is described in Chapter III. 
The instruments are entitled Astronomy Questionnaire, Computer 
Programming Questionnaire, Psychology Questionnaire, and 
Statistics Questionnaire. 

Two additional learner-centered variables were originally proposed 
for the present study. These were dogmatism as measured by the Rokeach 
Dogmatism Scale and school motivation as measured by the Jim Scale. How- 
ever, these tests were not administered because of a stand taken by the 
Broward County School Board against tests of a "psychological" nature. 
Subsequent to the submission of the proposal for this project, the Broward 
County School Board adopted a strong position against psychological and 
attitudinal testing. Part of the testing policy adopted reads as follows: 
"No tests other than standard intelligence aptitude and achievement test 
shall be scheduled or administered without the expressed consent of the 
board." The policy also provides that if a test is approved, parents must 
be notified by mail on school board letterhead of the testing and must be 
given the opportunity to inspect the instrument prior to testing. Because 
of these restrictions the Rokeach Dogmatism Scale and the Jim Scale were 
submitted to the Superintendent's office for clearance. It was informally 
decided by that office that the tests would not receive Board approval. 
Consequently, these two tests were regretfully dropped from the data 
collection effort. 

Criterion Measures 

Several different criterion measures were used in the study. 

These are listed below. 

1. Completion Time. The classroom monitors maintained logs of 
the time each student had the programmed materials. While 
these times are regarded as completion times, they do not 
necessarily reflect the amount of time the student actually 
worked with the materials, but merely the amount of time for 
which he had them in his possession. Sometimes the monitors 
were aware that a student was not working on the materials 
and deducted the time from the record. 



3 Nova High School does offer an elective course in computer 
programming and several students in the sample had either previously 
taken it or were concurrently enrolled in it. This problem is dis- 
cussed in the analysis section. 

2. Achievement Test. Achievement tests were developed for each 
of the four subject areas covered by the programmed material. 
The tests are in a multiple-choice format and involve recogni- 
tion recall. How the tests were constructed and their relia- 
bilities are discussed in Chapter III. Each test was given 
four weeks after the student completed the corresponding pro- 
gram (see discussion below). 

3. Response Error Rate. The number of errors made in responding 
to the response frames in each program was recorded by each 
student. Record sheets were provided on which the student 
wrote each solicited response. They were instructed to mark 
incorrect responses with a large X and these X's were later 
tallied by the student to indicate the total number of errors. 
While this method allows room for students to falsify their 
records, it has been used elsewhere (e.g., Flynn, 1968 and 
Kapel, 1965). The instructions stressed that the students 
would not be penalized for the number of errors made. 

Error rate indicates the difficulty of the material for 
the learners involved and in a programmed instruction study 
has little value as an ultimate criterion of effectiveness. 
Rather, its importance is in terms of how it relates to the 
other criterion measures. 

4. Attitude Scales. Student attitudes were recorded using brief 
questionnaires containing Likert-scaled items. A single ques- 
tionnaire was prepared in four versions to reflect the four 
content areas. The development of these questionnaires is 
described in Chapter III. 

Two additional criterion measures were originally proposed but were 
not used. These were immediate tests and transfer tests. Each achieve- 
ment test was to have an alternate form developed, with one form to be 
used to measure immediate achievement and the other for delayed reten- 
tion. The decision not to develop two forms was due to the change in 
timing of the study. As planned, the initial phase of the study in which 
the instruments were prepared would have coincided with the first part 
of the school year and would have made available independent samples of 
students to pretest the instruments. As funded, however, the initial 
phase occurred during the summer and students were not readily available 
to use for instrument pre-testing. Consequently, in the absence of a 
pre-study sample to empirically equate the alternate forms of the achieve- 
ment tests, it was decided not to develop them. 

Had the alternate forms been developed, it had been planned to 
divide the treatment groups into at least two subgroups, and preferably 
four, in order to vary the presentation order of the alternate forms. 
(The high attrition rate would have prevented the subgrouping anyway, 
had the other tests been developed.) 

The decision to use the achievement tests as delayed retention 
tests rather than as immediate tests was made for two reasons. First, 
the previous study with psychology materials (Flynn, 1968) showed no 
significant difference among the treatment groups on an immediate test. 
Second, the intent of curriculum materials is usually to provide for re- 
tention of content over time as opposed to immediate achievement only. 
Hence, the results using a delayed testing are of more practical value 
and interest than are results using only an immediate test. 

The decision not to use the transfer tests was made for a different 
reason. These tests were conceived to consist of items which would re- 
quire the student to apply and interpret the content of the programs to 
problems which were not contained in the programs themselves. This was 
not done because the original estimates for the time and difficulty in- 
volved in developing such tests were not realistic. After an initial 
attempt, it was decided that the time and labor involved would be dis- 
proportionately great compared to the duration of study and that the 
tests could probably not have been developed in time for use when the 
first students would have been ready to take them. 



Chapter III 

INSTRUMENT DEVELOPMENT 



Several instruments were developed for use in the study. These 
include both criterion instruments and learner characteristic instru- 
ments. The development of these is described in this chapter and 
copies of the instruments are contained in Appendix C. 

In the proposal for the present study, plans were outlined to 
pretest the various instruments on an independent sample of students. 
This was feasible when the proposal was submitted, because the timing 
was proposed to be such that instrument development could take place 
during the first part of the regular school year when students would 
be available. However, with a subsequent delay in funding, the first 
phase of the study did not coincide with the school year. Consequently, 
the pretesting of instrumentation was omitted because appropriate 
samples of students were not available. Therefore, measures of relia- 
bility and validity are not available independent of the students who 
were engaged in the research (except for the Psychology Achievement 
Test as noted below) . 



High School Curriculum Survey 

An instrument was developed to measure the students' relative 
interest in each of the four content areas of astronomy, computer pro- 
gramming, psychology and statistics. The interest, however, is mea- 
sured in terms of students' reactions to the names of courses rather 
than to any specific content. The interest instrument — named High 
School Curriculum Survey— consists of 140 pairs of course titles 
(randomly arranged) and the student is instructed to "select the 
course in each pair which you THINK you would prefer to take" (the 
complete instructions are in Appendix C). The instrument is scored 
for each of the four content areas by counting the number of times 
that the course title corresponding to that content area was selected 
as preferred. The title of each of the four content areas appears in 
15 different pairs so that the maximum score is 15 for each area. 

In addition to the four areas of astronomy, computer programming,
psychology, and statistics, 21 other course titles are used in the pairs.
Twelve of these titles, together with the four content areas of interest,
are combined in all possible pairs, which accounts for 120 pairs.
Consequently, the four content areas are included in 54 pairs, including
the six pairs which involve only the four. One pair of the 120 was
accidentally repeated (this pair did not involve any of the four content
areas), which leaves 19 pairs (of 140 total) made up using the remaining
9 course titles. These 19 pairs were randomly selected and do not involve
any of the four content area course titles. Consequently, a base of 120
pairs consisting of the possible combinations of 16 course titles makes up
the nucleus of the instrument. This nucleus, then, permits each
course to have a unique rank if the student is consistent in his
choices.
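The pair counts quoted above can be checked with a short enumeration. This is only an illustrative sketch: the twelve buffer titles are represented by hypothetical placeholder names, since the actual titles are not listed here.

```python
from itertools import combinations

CONTENT_AREAS = ["astronomy", "computer programming", "psychology", "statistics"]
# Hypothetical placeholder names for the twelve buffer titles in the nucleus.
BUFFER_TITLES = [f"buffer_{i}" for i in range(1, 13)]

nucleus = CONTENT_AREAS + BUFFER_TITLES
pairs = list(combinations(nucleus, 2))          # all possible pairs of 16 titles

n_pairs = len(pairs)                            # n(n-1)/2 with n = 16
# Number of pairs in which each content area appears (its maximum score).
per_area = {a: sum(1 for p in pairs if a in p) for a in CONTENT_AREAS}
# Pairs that contain at least one content area.
with_any_area = [p for p in pairs if set(p) & set(CONTENT_AREAS)]
# Pairs made up of two content areas.
only_areas = [p for p in pairs if set(p) <= set(CONTENT_AREAS)]
# Nucleus + one repeated pair + 19 variety pairs.
total_pairs = n_pairs + 1 + 19
```

Under these assumptions the enumeration gives 120 nucleus pairs, 15 pairs per content area, 54 pairs containing a content area (6 of them containing two), and 140 pairs in all.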

The course titles included as buffer items in the nucleus of 
120 pairs were selected within the following general specifications: 

1. Courses are included which are novel to the curriculum since
the four content areas were in part selected because of their
uniqueness in the curriculum.

2. Courses which are common to high school curriculums are also 
included to provide an adequate comparison base. 

3. Courses which are included cover a variety of discipline 
areas and include courses which are in the same general 
disciplines as the four content areas. 

4. An attempt was made to vary the conceptual level of the 
courses which are included to provide a range of comparisons. 

Using the first three criteria, a list of 50 courses (including
astronomy, computer programming, psychology of learning, and statistics)
was compiled. The course titles used were largely taken from a list
of courses offered by a high school (Criterion 2) and from several
undergraduate college catalogs (Criterion 1).

For the fourth criterion, conceptual level was arbitrarily defined
in terms of two dimensions. These dimensions are a verbal-numeric
continuum and a concrete-abstract continuum. Other dimensions could have
been employed instead which also would have provided a basis for
ensuring a range in the nature of the courses included. The list of 50 course
titles was given individually to five judges who were first asked to
"rate each course according to the content which you think is implied by
the title" on a five-point continuum from verbal to numeric. Then the
same five judges were asked to rate the courses on a five-point continuum
from concrete to abstract.

The ratings of the judges were used to reduce the list of course 
titles according to the following criteria: 

1. Courses were selected when the judges' ratings were in
complete consensus at any of the five points on the scale. This
provided a range on each of the two dimensions made up of
titles which evoke similar concepts of content in different
people in terms of either the verbal-numeric dimension or the
concrete-abstract dimension.

2. Courses which had mean ratings essentially identical to those of
any of the four content areas were selected. Courses were included in
this category if their mean rating fell within 1/2 point of the
mean of any of the four content areas on both continuums
(without regard to variance).

3. Courses for which the ratings were distributed identically
to the ratings for any of the four content areas on either
continuum were selected.

Criteria 1 and 2 both provided more courses than were needed, and
selection of particular courses was made on other bases, such as
selection by both criteria 1 and 2 or by distribution across discipline. In
some cases, the choice was essentially random.

The number of courses to be included was arbitrarily set at 16 in
an attempt to make the instrument long enough to get some spread among
the rankings but not so long that it would be extremely boring or would
require an excessive amount of time to complete. The number probably
could have been selected as 14 or 18 or some other similar number
without an appreciable difference in results. However, n courses form
n(n-1)/2 pairs, so the list becomes unmanageably large very quickly.

The additional 9 course titles were included to provide some variety
to the pairs, so that the student would not have to read the same 16 titles
repeatedly without variation. The 19 pairs represented by these 9 course
titles were randomly selected. They include only 2 pairs made up solely
from the 9 titles and 17 pairs made up of one member from the 9 and one
member from the twelve other course titles (excluding the four content
areas).

Since complete consistency in picking the preferred member of each
pair will result in unique rankings for each of the four content areas,
the frequency of ties in rankings should provide an indication of how
consistent the students are in their selections. A total of 103
students in the sample completed the instrument, and of these, 33 or 32.0%
had tied rankings for two or more of the four content areas. There are
65,536 ways in which 4 things can be assigned ranks from 0 to 15, and
21,856 or approximately 33.3% of these involve ties.¹ Thus, if the four
contents had been ranked randomly, 33.3% could have been expected to be
tied. The 32.0% actually tied is not significantly different from 33.3%
(Z < 1), which tends to indicate a lack of consistency in the ratings.
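The chance tie rate and the comparison with the observed rate can be sketched as follows. The text does not say which test produced the Z value; a one-sample z-test for a proportion is assumed here purely for illustration.

```python
import math

n_ranks = 16   # possible scores run from 0 to 15
n_areas = 4    # the four content areas

total = n_ranks ** n_areas                 # every assignment of scores
no_ties = math.perm(n_ranks, n_areas)      # 16 * 15 * 14 * 13 distinct-score assignments
ties = total - no_ties
p_random = ties / total                    # chance proportion of tied rankings

observed_tied, n_students = 33, 103
p_obs = observed_tied / n_students

# Assumed test: one-sample z-test of the observed proportion against p_random.
z = (p_obs - p_random) / math.sqrt(p_random * (1 - p_random) / n_students)
```

With these figures the count of tied assignments matches the 21,856 quoted in the text, and the absolute value of z is well below 1.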


A possible explanation for the lack of consistency is that some
students knew they were going to withdraw from the project and did not
take the instrument seriously. If this is true, students who stayed in
the program should have fewer ties than those who withdrew. Fifty-four
of the students who completed one or more programs had 13 ties or 24.1%
(1 student did not complete the instrument). Forty-nine students who
dropped out of the program had 29 ties or 59.2%. This difference yields
a χ² of 3.31, which is not quite significant at the .05 level. However,
the difference is large enough to suggest that those students who dropped
out tended to be less consistent in picking courses than those who
finished one or more programs.



¹The total number of ways is $n^4$, or $16^4$, and the number of ties is
$n^4 - n(n-1)(n-2)(n-3)$.

Of the 54 students who finished one or more programs, only 44
finished all four programs and completed the instrument. Considering
these 44, only eight or 18.2% had ties in rankings. When this figure
is compared against the 29 ties among the 49 students who dropped out
without completing any of the programs, a χ² of 5.54 is obtained, which
is significant (p < .05, df = 1). Thus, it appears that the consistency
of the instrument is related to the students' completion status. This
indicates that the High School Curriculum Survey can be used as a
predictor of criterion variables for students who completed the study, but
should not be used to discriminate between dropouts and non-dropouts.
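The 2 x 2 comparisons of tied versus untied rankings can be sketched with the ordinary Pearson chi-square. The report does not state the exact formula used (for example, whether a continuity correction was applied), so the uncorrected form is assumed. The counts as printed (13 of 54 against 29 of 49) do not reproduce the reported 3.31 under this assumption; a count of 20 tied dropouts does, and it also agrees with the overall total of 33 tied students. Which figure was intended cannot be settled from the text, so both are computed.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Rows: completers, dropouts. Columns: tied, not tied.
as_printed = chi_square_2x2(13, 41, 29, 20)    # 29 tied dropouts, as printed
alternative = chi_square_2x2(13, 41, 20, 29)   # 20 tied dropouts (assumption)
```

With df = 1, the .05 critical value is about 3.84, so only the alternative reading is consistent with a result described as not quite significant.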

A test-retest reliability check was not made, although such a check
would give a good indication of the stability of the items over time.
Further, the stability of each pair could be examined and unstable pairs
could be eliminated.

The validity of the instrument must be considered in terms of content
validity. It is obvious that a variety of contents could actually be
offered under each of the course titles included in the instrument. Thus,
a student's preference for a particular course title does not necessarily
mean that he would actually like such a course were he enrolled in it. In
the case of courses which are common to high school curriculums (algebra,
general physics, etc.), many students have had previous experiences with
them and can pick or reject them from an experiential base. This is not
true, however, in the case of the less common courses (such as
aerodynamics, practical logic, etc.). Here the student must respond in terms of
his mental image of such courses, based at best on hearsay, popular press,
and vicarious experiences. This was the main reason course titles were
included in the final instrument when the five judges agreed on the point
at which they fell on either of the two continuums (Criterion 1): to have
course titles which evoked similar conceptions of content from different
individuals.

Since some of the course titles are not regularly included in high
school curriculums, it is quite conceivable that at least some of the
students did not know what the title meant and had no idea of the nature
of the course content. In fact, some students asked what some of the
course titles were while they were taking the instrument (they were not
told the meanings since everyone was not given that opportunity). The
extent to which students did not know what the titles meant would influence
both the reliability and validity of the instrument. Some of the lack of
internal consistency indicated by the high frequency of tied rankings
reported above may be due to the students' being unfamiliar with some of
the course titles.



Prior Knowledge of Subject 

As discussed in Chapter II, it was desired to evaluate the amount of
background the students had in each of the four content areas without
using a regular achievement test. The instrument developed consists of
ten questions, which are the same for each content area except for
the name of the content area. The questions can be answered with a
simple "Yes" or "No."

The questionnaires are scored by assigning a numeric weight of
1, 2, or 3 to those answered yes according to their relative
significance. The weights assigned are arbitrary and could have been assigned
differently. The questions from the psychology questionnaire are shown
below with the numeric weights for each.



Question                                                        Weight

 1. Have you ever been taught in school anything
    about psychology?                                              3
 2. Have you ever read a book about psychology?                    3
 3. Have you ever read a magazine article about
    psychology?                                                    2
 4. Is anyone you know a psychologist?                             1
 5. Have you ever visited a psychological laboratory?              1
 6. Have you ever talked to a psychologist about psychology?       2
 7. Have you ever watched a psychologist work?                     1
 8. Is your father or mother a psychologist?                       3
 9. Do you ever talk about psychology at home with
    members of your family?
10. Do you know what a psychologist does?                          1

No attempt was made at establishing reliability for the instrument.
While there is some face validity to the questionnaire, some of the
items may give misleading results, especially for the psychology and
statistics questionnaires. The popular conceptions of psychology and
statistics tend to differ from the content of the programmed materials
in the course, and familiarity with these popular concepts would not
indicate knowledge of the program contents. Similarly, the specific
contents covered by the titles of astronomy, computer programming,
psychology, and statistics can be quite varied, and knowledge of one type
of content would not necessarily be related to the types covered in this
study.
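The weighted scoring of the prior-knowledge questionnaire can be sketched as a sum of the weights of the items answered "Yes." The function and variable names are illustrative. The weight for item 9 is not legible in the source, so the value used for it below is a placeholder assumption, and the weights for items 1 to 4 are taken in the order in which they are printed.

```python
# Item weights for the psychology questionnaire.
# Item 9's weight is not legible in the source; 2 is a placeholder assumption.
WEIGHTS = {1: 3, 2: 3, 3: 2, 4: 1, 5: 1, 6: 2, 7: 1, 8: 3, 9: 2, 10: 1}

def prior_knowledge_score(answers, weights=WEIGHTS):
    """Sum the weights of the items answered yes.

    answers maps item number -> True for "Yes", False for "No".
    """
    return sum(weights[item] for item, said_yes in answers.items() if said_yes)

# Example: "Yes" to items 1, 8, and 10 only.
example = prior_knowledge_score({1: True, 2: False, 8: True, 10: True})
```

Because the weights are arbitrary, passing a different `weights` mapping gives an easy way to see how sensitive a student's score is to the weighting chosen.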



Achievement Tests

Four achievement tests were developed as criteria for measuring the
effectiveness of the instructional material. The tests are identical for
each version of the material within each content area and are in
multiple-choice form. The tests cover only the material presented in Part II and
not Part I. A complete table of specifications was not prepared for
each set of material; because the programmed instructional material
is in discrete frames, the material to be tested could be randomly
sampled. This method was more efficient than preparing a table of
specifications, and it probably more validly represents the material. In order
to equally represent each version of the material, 100 of the original
frames were randomly selected from each Part II of the materials, one
frame from each page. This represents 1/8 of the total number of frames.
Then a multiple-choice item was written to reflect the content of each
of the selected frames. Frames for which test items were written were
not screened to eliminate trivial content.

Since each test was to be given within a sixty-four-minute period,
only 64 questions were included in the final versions of the tests. The
64 questions represent 8% of the 800 frames in each Part II and are
allocated in such a way that they also represent 8% of the response frames
(terminal frames in each unit) for each version of the material. That is,
regardless of version, 8% of the frames which terminate a unit and which
solicit a response are represented by a test question (this distribution
will be different following item analysis). Table 3-1 illustrates the
numerical allocation of these questions. Additionally, each of the eight
paginal frame positions is equally represented by eight test items.



Table 3-1

Numerical Allocation of Achievement Test Questions
By Response Frames Across Versions

                                                  Test Frames Coinciding
           No. of Frames   Total No. of Units     with Response Frames
Version    Per Unit        and Response Frames    Number      Percentage

1          1               800                    64          8.0
2          4               200                    16          8.0
3          16              50                     4           8.0
4          32              25                     2           8.0
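The figures in Table 3-1 follow directly from the unit sizes: each version has 800 divided by its unit size response frames, and 8% of those coincide with a test question. A minimal sketch of that arithmetic, with illustrative names:

```python
PART_II_FRAMES = 800
TEST_ITEMS = 64
FRAMES_PER_UNIT = {1: 1, 2: 4, 3: 16, 4: 32}   # version -> frames per unit

rows = []
for version, unit_size in FRAMES_PER_UNIT.items():
    units = PART_II_FRAMES // unit_size                   # one response frame per unit
    test_frames = units * TEST_ITEMS // PART_II_FRAMES    # 8% of the response frames
    percentage = 100 * test_frames / units
    rows.append((version, unit_size, units, test_frames, percentage))
```

Each tuple in `rows` corresponds to one line of the table.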



The selection of the questions in the final version of each test
was based upon the following criteria:

1. Each of the responding versions is equally represented as
described above.

2. Questions subjectively judged to be ambiguous or otherwise of
poor quality were not used.

3. Questions were physically distributed as much as possible across
the material for each of the responding frames.

Where these criteria were not adequate to choose between two or
more questions, the choice was made randomly.

The resultant 64 questions for each test were then randomly ordered
for the final version to eliminate any temporal ordering
effect in terms of the amount of time which had passed since that part of
the material was studied.

While all four tests were developed in the above manner, the initial
development of the psychology test had taken place during a prior study
(Flynn, 1968). At that time, nine test items were removed by item analysis
as failing to discriminate, and the resultant test had an internal
consistency of .94 (Kuder-Richardson Formula 20). For the present study, each
of these nine items was either replaced by a substitute item or
rewritten. This revised instrument was designated as Form B of the
Psychology Achievement Test.

The items in the four tests were analyzed to determine their ability
to discriminate based upon the pooled responses of the students across
treatment groups. In the previous development of the psychology test, the
discrimination index used was the difference in the proportion of students
in the upper and lower 27% of the sample getting each item correct (Flynn,
1968). However, in the present study the point-biserial correlation
between the item and the total score was employed. Guertin (1965) discussed
the problems associated with using the total test score as the criterion
for item analysis and reported a computer program which periodically
updated the total score as the items with the lowest correlations were
eliminated. A program patterned after Guertin's was developed and used
in the present study. This program differed from Guertin's in two main
respects: first, the point-biserial correlation was used instead of the
biserial, and second, the total score was revised after each item (or group of
items with the same correlation) was eliminated rather than after 5% of
the items were eliminated. As with Guertin's program, the rejected items
were analyzed in a second pass to determine if they form an identifiable
second dimension of the test. In both analyses for the four tests, items
were dropped which had an absolute correlation of .25 or below with the
total. The value of .25 represents the approximate .05 significance
level for each of the four samples (although only one sample was used
with repeated measures, some students did not complete all four tests,
resulting in a different N for each test).
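The iterative elimination described above can be sketched as follows. This is not the program used in the study: the function names are hypothetical, the total score is assumed to include the item being correlated, "lowest" is taken to mean lowest absolute correlation, and an item with no variance is treated as having a correlation of zero.

```python
import numpy as np

def point_biserial(item, total):
    """Pearson correlation between a 0/1 item and the total score."""
    if item.std() == 0 or total.std() == 0:
        return 0.0   # assumption: an item with no variance cannot discriminate
    return float(np.corrcoef(item, total)[0, 1])

def iterative_item_analysis(responses, cutoff=0.25):
    """Drop the weakest item(s), rescore, and repeat until all exceed the cutoff.

    responses: (students x items) array of 0/1 scores.
    Returns (retained, rejected) lists of column indices.
    """
    responses = np.asarray(responses, dtype=float)
    retained = list(range(responses.shape[1]))
    rejected = []
    while retained:
        total = responses[:, retained].sum(axis=1)       # revised total score
        r = {j: point_biserial(responses[:, j], total) for j in retained}
        worst = min(abs(v) for v in r.values())
        if worst > cutoff:
            break
        # Remove every item tied at the lowest correlation, then rescore.
        drop = [j for j in retained if abs(r[j]) == worst]
        rejected.extend(drop)
        retained = [j for j in retained if j not in drop]
    return retained, rejected

# Tiny illustrative data set: item 3 is answered correctly by everyone.
demo = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 1],
        [0, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]]
retained, rejected = iterative_item_analysis(demo)
```

A second pass of the kind described in the text would amount to calling the same function again on only the rejected columns.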

Tables 3-2, 3-3, 3-4, and 3-5 show the results of the item analyses.
The proportion of Ss getting each item correct is also reported, although
this was not used as a criterion for rejecting items since the
discrimination criterion rejects those items which are too easy or too difficult.
The relatively high number of items rejected in each test may be in part
due to the fact that the tests were administered four weeks after the

Table 3-2

Items Retained Following Item 
Analysis of the Astronomy Achievement Test 



’ T r " ' Correlation "* Correlation 

Percent ’ Percent 



Item 


Correct 


I 


II 


Item 


Correct 


I 


II 


1 


94 






33 


50 


.33 




2 




.36 




34 


71 


.32 




3 


>40 


.37 




35 


75 


.37 




4 


30 






36 


67 


.45 




5 ' 


67 


- — 


• 45 


37 


59 


.62 




6 


80 ^ ^ 


' .42 




T 38 


■ ’ 36 V1 


.48 




7 


. 44 


— - 


.55 


39 


51 


.35 




8 


55 


.41 




40 


34 


• 




9 


32 


.37 




41 


" 28 


.36 




10 


15 


mmmmmsm 




42 


44 


.57 




11 


53 


v43 




43 


44 


— 


.35 


12 


63 


.33 




44 


32 


.34 




13 


48 


— 




45 


40 


. 


.37 


14 


69 


,57 




46 


36 


— 


.42 


15 


17 


M* *£*««* 




47 


65 


— 




16 


57 


" .53 




48 


46 


.43 




17 


57 


.29 




49 


0 


• — 




18 


63 


.39 




50 


48 


.34 




19 


76 


.42 




51 


55 


.37 


» 


20 


59 


.41 




52 


59 


.62 




21 


53 


.58 




53 


5 


— - 




22 


75 


.46 




54 


59 


.36 




23 


32 


MM 




55 


76 


.37 




24 


26 


— .* 


.39 ' 


56 


61 


.32 




25 


50 


.28 




57 


71 


.41 




26 


50 


.59 




58 


40 


.37 




27 


32 


.32 




59 


69 


.31 




28 


17 


MMlM 


.36 


60 


69 


.33 




29 


50 


— 


.40 


61 


30 


— 


.41 


30 


61 


rn.rn.m0 


.49 


62 


. 69 


.41 




31 


50 


.42 




63 


5 


— 




32 


21 


.30 




64 


53 


.39 
Table 3-3



Items Retained Following Item 
Analysis of the Computer Programming Achievement Test 







Correlation 






Correlation 


* 


Item 


Percent 








Percent 


■ 




0 P 


Correct 


I 


II 


Item 


Correct 


I 


II 


- 


1 


36 




.26 


33 


10 






- • 


2 


32 


.47 




34 


. 40 


.35 




* 


3 


32 


.45 




3b 


34 


.41 




* 


4 


38 


.37 




36 


34 


• 46 






5 


36 


- .45 




37 


44 


.30 




t 

* 


6 


40 


.37 




38 


36 


.45 






7 


26 


.39 


( 


39 


22 








8 


44 


.31 




40 


53 


.31 




• V 


9 


34 


.45 




41 


57 


.34 






10 


51 


_ .39 




42 


22 


.41 


* * 


11 


48 


.39 




43 


53 


.41 




\ 


12 


32 


.32 




44 


18 


.45 




4 


13 


42 


.37 




45 


32 


.37 






14 


30 


.33 




46 


16 


.28 


i ’ 

\ . 

* . 


15 • 


0 


— 




47 


. 34 




16 


42 


.35 




48 


32 




• 41 * 


> 


17 


42 


— 


.42 


49 


38 


.50 


t * *. 

* 


18 


28 


*50 




50 


48 




i. ; 


19 


55 


- — 


*48 


51 


16 


.35 






20 


40 


.31 




52 


14 


.41 




Of *■ 


21 


0 


- — 




53 


30 


.41 




1 


22 


28 


.56 




54 


32 


.56 




1 * 


23 


* 42 


.58 




55 


36 


.50 




•f ** • 


24 


51 


.34 




56 


32 


.43 




i . . i 

i. j 


25 


26 


.27 




57 


32 


.40 




26 


36 


.32 




58 


10 








27 


24 




.57 


59 


30 


.* 


.28 


♦ • » 

i - 

L 


28 


40 


.27 




60 


36 


.32 


29 


51 


.42 




61 


18 




.34 


* 


30 


26 


— - 




62 


44 


.42 


¥ 

[ 

i • 


31 

32 


2 

59 




.37 


63 

64 


51 

38 


.40 

.47 






* 
Table 3-4



Items Retained Following Item 
Analysis of the Psychology Achievement Test 



Correlation Correlation 



Percent _ Percent 



Item 


Correct 


I 


II 


Item 


Correct 


I 


II 


1 


43 


.36 


* i 


33 


52 


.48 




2 


34 


.53 




34 


13 




.31 


3 


47 


.26 




35 


50 


.58 




4 


13 






36 


21 


.41 




5 


47 


.61 




37 


50 


.41 




6 


28 


755" 




38 


30 


— 


.37 


7 


8* 






39 


39 


.50 




8 


23 


— - 




40 


41 


.61 




9 


73 


.40 




41 


50 


.35 




10 


32 


(p» mmmrn 




42 


17 


— - 




11 


34 


.37 




43 


32 


.50 




12 


28 






44 


41 


.35 




13 


34 


.35 




45 


69 


- — 


.52 


14 


41 


.41 




46 


54 


.43 


15 


54 


■IMM 


.31 


47 


50 




16 


23 




.46 


48 


45 


.37 


i 


17 


34 






49 


34 


— 


.33 


18 


58 


«... 


.28 


50 


23 


mmmdmm 




19 


56 


• 46 




51 


45 


.58 




20 


34 


w - 


.49 


52 


43 


.41 




21 


30 


.33 




53 


39 


'.53 




22 


50 


— Ill 


.36 


54 


36 


.29 




23 


47 


.49 




55 


65 


11 nil Ml 


.31 


24 


54 


.60 




56 


41 


— - 


25 


34 




.45 


57 


6 


— 




26 


58 


.43 




58 


47 


.37 




27 


19 




.29 


59 


39 


.28 




28 


34 


.42 




60 


50 


.74 




29 


47 


.41 




61 


47 


.55 




30 


19 


- — 


.34 


62 


17 


lyn i*i 




3l " 


21 


r 


73T" 


63 


17 






32 


41 


— 




64 


50 


.57 
Table 3-5



Items Retained Following Item 
Analysis of the Statistics Achievement Test 



Correlation Correlation 

Percent * Percent 



Item 


Correct 


I 


II 


Item 


Correct 


l 


II 


1 


50 


.... 




33 


62 


.61 




2 


12 


— • 


.34 


34 


8 


— 




3 


42 


.54 




35 


16 


.27 




4 


68 


.31 




36 


16 


. — 




5 


24 


.36 


, 


37 


30 


.34 




6 


58 


.33 




38 


24 


.37 




7 


26 




.27 


39 


44 


.40 




8 


44 


.35 




40 


52 


.34 




9 


38 


.28 




41 


50 


.36 




10 


62 


.31 




42 


52 


.43 




11 


30 


.31 




43 


48 


.61 




12 


26 


— wr** 


.39 


44 


50 


— 




13 


32 


.42 




45 


22 


— 


.33 


14 


28 


- — 




46 


62 


.47 




15 


2 


— 




47 


34 




.27 


16 


16 




;30 


48 


28 


.39 




17 


32 


.30 




49 


10 


. — 


* 


18 


46 


.42 




50 


66 


.54 




19 


44 


.47 




51 


24 


.28 




20 


34 


.34 




52 


44 


.53 




21 


56 


. .36 




53 


42 


... 


.28 


22 


40 




.43 


54 


24 


... 


.28. 


23 


34 


.48 




55 


28 


... 


.31 


24 


32 


.53 




56 


60 


. — 


.37 


25 


6 


— - 


.33 


57 


38 


.35 




26 


64 


.42 




58 


14 


.33 




27 


24 


... 


• 33 


59 


72 


.59 


• 


28 


38 


- — 


•26 


60 


62 


.27 




29 


26 


... 




61 


30 


.30 




30 


34 


.52 




62 


30 


— 




31 


60 


.49 




63 


50 


— 


,4o 


32 


26 


.32 




64 


0 
students completed the relevant materials. Fewer items would have been
rejected had the tests been given immediately. The delayed
administration may also explain why a large number of items were rejected from
the psychology test when it had been previously subjected to an item
analysis.

After the item analyses, the internal consistencies of the tests
were calculated using the Kuder-Richardson formula

$$ r = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_x^2}\right) $$

where

$r$ is the reliability coefficient,

$k$ is the number of test items,

$p_i$ is the proportion of Ss getting item $i$ correct,

$q_i$ is $1 - p_i$, and

$\sigma_x^2$ is the variance of the test scores.
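A minimal sketch of the Kuder-Richardson Formula 20 computation from a students-by-items matrix of 0/1 scores is given below. The function name is illustrative, and the population variance (dividing by N) is assumed for the total scores so that it is on the same footing as the item variances p times q; the report does not say which variance estimate was used.

```python
import numpy as np

def kr20(responses):
    """Kuder-Richardson Formula 20 for a (students x items) array of 0/1 scores."""
    responses = np.asarray(responses, dtype=float)
    k = responses.shape[1]                      # number of test items
    p = responses.mean(axis=0)                  # proportion correct per item
    q = 1.0 - p
    var_total = responses.sum(axis=1).var()     # population variance of total scores
    return (k / (k - 1)) * (1.0 - (p * q).sum() / var_total)

# Tiny illustrative data set: six students, three items.
demo_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0],
               [0, 0, 0], [1, 1, 1], [0, 0, 0]]
r_demo = kr20(demo_scores)
```

For the six-student example the coefficient works out to 16/19, or about .84.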



The obtained r's are reported in Table 3-6. The consistently low r's
for the second dimension on each test are probably due to the small
number of items contained in these sub-tests. The number of items on these
varies from 10 to 15. Because of the low reliabilities of the second
dimensions of the tests (they range from .38 to .52), they can only be
utilized for mean comparisons among groups. The variability is too
great to use these scores as criterion variables in individual
prediction. The coefficients obtained for the first and major dimension of
each test are all in the high .80's and are considered quite adequate.

Table 3-6

Kuder-Richardson Coefficients for the Two
Dimensions of the Four Achievement Tests

                                 Computer
                 Astronomy      Programming     Psychology      Statistics
                 I      II      I      II       I      II       I      II

N                52     52      49     49       46     46       50     50
No. of Items     43     10      45     10       40     14       39     15
Mean             23.8   4.2     17.0   3.4      16.7   4.9      16.8   4.5
S.D.             8.4    2.0     8.6    1.7      8.2    2.4      7.6    2.1
r                .88    .50     .89    .38      .87    .52      .88    .41



Due to the method by which the tests were constructed, they can
be expected to have relatively high content validity, and no attempt
was made to establish criterion-related validity. Each question was
written to reflect the content of a specific frame, and the internal
consistency is reasonably high.

Attitude Scales 

Four instruments were developed to measure students' attitudes
toward the materials and the study following the completion of each
program. Each instrument consists of nine 5-point Likert-scaled
items plus space for any additional comments the students wished to
make. The instruments are identical for each content area except
for changing the name of the area to the appropriate one.

Initially, it had been planned to create one composite
questionnaire which would contain items comparing the different contents and
versions. This was not done, however, because such a composite
questionnaire would have to be given after the student finished all four
programs, and the amount of time elapsed between the completion of the
material and the completion of the instrument would vary greatly within
and between treatment groups.

In each content area, three factors were hypothesized. These are 
attitudes toward the specific content, attitudes toward the method of 
instruction, and attitudes toward the instructional situation. These 
are labelled Subject, Method, and General, respectively. The nine 
items were predicted to load on the factors as shown below: 

ITEM                                                         FACTOR

(1) I would like to read more about astronomy.               Subject

(2) I would rather read a regular textbook on astronomy
    than have it taught by the materials I have just
    completed.                                               Method

(3) I did not like the way astronomy was taught in the
    materials I have just completed.                         Method

(4) I would like to become an astronomer.                    Subject

(5) The booklets had a lot of mistakes in them.              General

(6) The printing in the booklets was difficult to read.      General

(7) There was too much confusion in the classroom while
    I worked.                                                General

(8) I learned a lot about astronomy from reading
    the materials.                                           Subject

(9) I would like to see these booklets become part           Subject
    of the regular course work at Nova High School.          Method

While the above items are from the astronomy questionnaire, the
corresponding items on the other questionnaires were predicted to load in the
same way. Item 9 could be interpreted by the student as referring to
either the subject or the content or both and could load on either
factor. To a lesser extent, some of the other items might load on more
than one factor. For example, Item 8, "I learned a lot about astronomy
from reading the materials," could be answered either in terms of the
student's reaction to astronomy or in terms of his reaction to the
method. Similarly, elements of Items 2 and 3 could be interpreted as
subject related in addition to being method related.

Each questionnaire was factor analyzed using a principal components
program developed by International Business Machines, Inc. In
all four cases, only three factors had eigenvalues greater than 1.0.
These three factors accounted for 61% of the variance for astronomy,
62% for computer programming, 66% for psychology, and 66% for
statistics. The three factors for each instrument were rotated to the varimax
criterion. The resultant loadings with absolute values equal to or
greater than .35 are reported in Tables 3-7, 3-8, 3-9, and 3-10. In
each table, the predicted items for each factor are indicated by boxes.
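The extraction step described above (principal components of the item correlation matrix, keeping factors with eigenvalues greater than 1.0) can be sketched with NumPy. This is an illustration, not the IBM program used in the study; the function name is hypothetical and the varimax rotation is deliberately left out.

```python
import numpy as np

def principal_components(data):
    """Principal components of the correlation matrix of a (students x items) array.

    Returns all eigenvalues (largest first), the unrotated loadings of the
    components whose eigenvalue exceeds 1.0, and the proportion of total
    variance those components account for.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigh: corr is symmetric
    order = np.argsort(eigvals)[::-1]                # sort from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                             # eigenvalue-greater-than-one rule
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep].sum() / eigvals.sum()
    return eigvals, loadings, explained
```

Because the eigenvalues of a correlation matrix sum to the number of items, the proportion of variance accounted for by the retained factors is simply their eigenvalue total divided by nine for these nine-item questionnaires.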



Table 3-7 



Factor Structure of Astronomy Questionnaire
Items and the Relationship to the Hypothesized Structure*



• Item 




Factor 




General 


Subject 


Method 


1 




fTsrj 


i 


2 


.59 




FT38| 


3 ’ •; 


• 




FIST) 


4 




m n 


• 


5 


r^n 


* * 
> » 




6 


[■778*1 






7 


LM J 






8 




■ 1=3 


.82 


9 




C3D 


.76 



*Boxes in table indicate where items were hypothesized to load. Only
loadings equal to or greater than |.35| are reported.



Table 3-8 



Factor Structure of Computer Programming Questionnaire 
Items and the Relationship to the Hypothesized Structure* 



Factor 



Item General Subject Method 



1 

2 


• 9 


c m . 


■ E=l 


3 . 




-.50 


F=l 


4 




GM 

«> 




5 


PD 


-.38 




6 


CUD 






7 

8 


1=1 


1=1 


1 —.55 i 


9 

1 




I- 66 1 


nm 

• i 

♦ 



*Boxes in table indicate where items were hypothesized to load. Only
loadings equal to or greater than |.35| are reported.



Table 3-9



Factor Structure of Psychology Questionnaire
Items and the Relationship to the Hypothesized Structure*



Factor 



Item General Subject Method 



1 


•A6 


17^1 




2 




.58 


EE1 


3 




/ 


ReTl 


4 




f* 

17931 ... 

1 . , 


- 


5 






6 


b54j 


1 




7 




1 


-.60 


8 


.72 


□ 




9 


• 71 


1 1 


I. 35 J 



*Boxes in table indicate where items were hypothesized to load. Only
loadings equal to or greater than |.35| are reported.



Table 3-10 



Factor Structure of Statistics Questionnaire
Items and the Relationship to the Hypothesized Structure*



Item 


** 


Factor 




General 


Subject 


Method 


1 


i 




.39 


2 




.76 


rb 


3 








4 


1 

« . 


QD 


:nm 


5 


:pBo| 


* 




6 


nw 


» 




7 


G22 




.36 


8 






.79 


9 

4 


4 




ph 

» 

» 



*Boxes in table indicate where items were hypothesized to load. Only
loadings equal to or greater than |.35| are reported.











As can be seen from the tables, there is high agreement between
the obtained factors and the hypothesized factors. This agreement
tends to validate the instrument, which in turn indicates good
reliability. The instruments were not examined for reliability and
validity beyond the factor analyses. In addition to the agreement between
the hypothesized factors and the obtained ones, the scales appear to
have face validity.

The instruments were scored on the basis of the hypothesized
factors rather than the obtained ones. This was done because of the
high agreement between the two sets and because comparable scores
across instruments are desirable for interpretation. Thus, the ratings
on each scale are weighted by 1 or 0 for each of the three factors
depending upon its hypothesized inclusion or exclusion from that factor.
Because of this method of scoring, the scores on the three factors will
not be orthogonal to each other.
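The unit-weight scoring described above can be sketched as follows, using the hypothesized item assignments (item 9 counted under both Subject and Method). The names are illustrative, and no reverse scoring of negatively worded items is applied because the text does not describe any.

```python
# Hypothesized item-to-factor assignments; a weight of 1 if listed, 0 otherwise.
HYPOTHESIZED = {
    "Subject": [1, 4, 8, 9],
    "Method": [2, 3, 9],
    "General": [5, 6, 7],
}

def factor_scores(ratings):
    """ratings maps item number (1-9) -> rating on the 5-point scale."""
    return {factor: sum(ratings[i] for i in items)
            for factor, items in HYPOTHESIZED.items()}

# Example: a student who marks the midpoint (3) on every item.
midpoint_scores = factor_scores({i: 3 for i in range(1, 10)})
```

Since item 9 contributes to both the Subject and Method scores, those two scores share variance by construction, which is one concrete reason the factor scores are not orthogonal.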



Other Data Collection Forms 

In addition to the structured instruments described above, three 
other data collection forms were prepared. These are the Information 
Sheet, the Student Record Sheet, and the Non-Participation Response 
Sheet. Copies of . these are also included in Appendix, C. 

The Information Sheet was filled out by the students on the first 
day of class. It includes spaces for their name, student number, Ad
Com Room (i.e., homeroom), Ad Com teacher, grade, date of birth,
number of years at Nova, and names of all courses in which they were
currently enrolled. 

The Student Record Sheet was used with the programmed materials
for the student to record the book and version of materials he was
working with, the beginning and ending dates of both Part I and Part II,
and the number of errors made in both parts.

The Non-Participation Response Sheet was filled out by each student 
who formally dropped out of the Nova R project. It asked why they were 
not participating, if they would participate under other circumstances, 
and if they thought research should be conducted at Nova. It also 
solicited additional comments.



Chapter IV 

ATTRITION AND THE RESULTANT SAMPLE 



In relation to the goals of the study, the causes and effects of 
attrition warrant close examination for two important reasons. First,
it is important to ascertain whether the attrition has affected the random
assignment to treatment groups so as to invalidate or make uninterpretable
the results obtained. Second, the causes of attrition may indirectly
aid in attaining the objective of the study concerned with identifying
characteristics of learners who succeed with programmed instruction.
If some of the attrition is due to the learner's incompatibility
with programmed materials, then it is important to identify the
characteristics of such learners.



Distribution and Randomness of Attrition 

For the purposes of analysis, three general categories of non- 
participants can be identified. These are (1) students who were not 
actually enrolled in school at the time of the study although they 
were on the class rolls, (2) students who showed up for the study but 
who dropped out before beginning work on the programmed materials, and 
(3) students who dropped out after beginning work on the programmed 
materials. Distinctions among these three categories are important in 
examining the attrition problem. Students in the first category can be 
considered as non-entities who have no effect on the sample or the study 
other than reducing the size of the original N. Students in the second 
category can be expected to alter the characteristics of the remaining 
sample and reduce its size by dropping out, but their absence should
not generally influence the differences among treatment groups nor should
their dropping out be considered as indications of negative reactions
to the materials.

The attrition in the last category, however, is potentially a more 
serious problem. These students may have dropped out because of their 
experiences with the materials. Since the students in each treatment group 
started with a different set of materials it is possible that some treatment 
groups suffered greater attrition than others. If the attrition occurred 
unevenly across treatment groups, the characteristics of the resultant sam- 
ple could vary greatly as a result. Thus, it is particularly important to 
study the nature and effect of attrition in the last category. 


Table 4-1 shows the number of students in each of the three attrition 
categories as well as the number of students who finished. Also, the 
columns represent the treatment group to which the students were assigned. 
Students in Categories 1 and 2 who were assigned to treatment groups were 
unaware of their assignment and did not see the materials. The ten
students not in school who were assigned to groups were assigned
on the possibility that they might arrive at a later time. However, they
did not.



Table 4-1

Distribution of Students by Completion Category and Treatment Group

a Five students who were assigned to the second term requested and were granted permission to
participate in the first term. They are included in this table in the first term figures.


The information in the table is slightly misleading since the students 
in the second term were assigned to groups in order to fill them out and 
were not evenly distributed. Consequently, the figures cannot be directly 
pooled across terms for determination of randomness of attrition. In- 
stead, comparisons within terms are more interpretable. Within each term, 
then, a comparison can be made across treatment groups between those who 
saw the materials and dropped out and those who completed all the work. 
These comparisons will permit the determination of the randomness of attri- 
tion across groups. 

In the second term all of the students who started either dropped out 
before they completed any materials or they completed all of the materials. 
This was not true in the first term where ten students partially completed 
the materials (1 to 3 sets). These ten could logically be included with
either the dropouts or the non-dropouts for comparison purposes. If the
students in a treatment group who dropped out before completing any
materials did so because of their experience with the first program in that
sequence, then they may be different from the students who completed the
first set and then quit later in the study. However, the students in both
groups did drop out and can be considered collectively as dropouts. The
way in which they are considered in determining the randomness of
attrition is indicated by the analyses to be performed. If only the criterion
data from students who completed all the materials are to be analyzed to
determine treatment effects, then the partial completers should be
considered as dropouts. If data from those who completed all and those who
completed some materials are to be combined for analysis, then the partial
completers should be considered as non-dropouts. Since both types of
analyses are done, comparisons will be made both ways.

Tables 4-2, 4-3, and 4-4 show χ² contingency tables for the two
first-term comparisons and the second-term comparison. The χ²'s for

Table 4-2

First Term: Frequency of Students Starting Materials and
Completing 0 Sets Versus Those Completing 1-4 Sets

                        Treatment Group
Sets Completed      1      2      3      4    Total

0                   3      5      2      7     17
1-4                12      9      8     11     40

Total              15     14     10     18     57

χ² = 2.08, d.f. = 3, n.s.
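
The test of independence reported for Table 4-2 can be checked with a short sketch such as the one below. It uses the ordinary Pearson chi-square formula on the observed counts. The report states only that the result is not significant, so the .05 level used here for the critical value is an assumption. The computed statistic should land close to the reported 2.08, with a small difference attributable to rounding.

```python
# Observed counts from Table 4-2.
# Rows: 0 sets completed, 1-4 sets completed. Columns: treatment groups 1-4.
observed = [
    [3, 5, 2, 7],
    [12, 9, 8, 11],
]


def chi_square_independence(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Tabled critical value of chi-square for d.f. = 3 at the .05 level
# (the .05 level is an assumption; the report gives only "n.s.").
CRITICAL_05_DF3 = 7.815

chi2, dof = chi_square_independence(observed)
significant = chi2 > CRITICAL_05_DF3
print(f"chi-square = {chi2:.2f}, d.f. = {dof}, significant at .05: {significant}")
```

A statistic this far below the critical value is consistent with the interpretation that first-term attrition was not concentrated in any one treatment group.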



